Machine learning overview
Loss functions for numerical output:
\[ L(Y,\hat{f}(X))=\begin{cases} |Y-\hat{f}(X)| & \text{(absolute error)} \\ (Y-\hat{f}(X))^2 & \text{(squared error)} \end{cases} \]
Loss functions for categorical output: \[ L(Y,\hat{f}(X))=\begin{cases} I(Y \ne \hat{f}(X)) & \text{(0--1 loss)} \\ -2\log \hat p_k(X) & \text{(deviance)} \end{cases} \]
Training error is the average loss over the training sample:
\[\frac{1}{n} \sum_{i=1}^n L(y_i,\hat{f}(x_i))\]
Test (generalization) error is the expected loss over an independent sample:
\[\mathrm{E}\left[L(Y,\hat{f}(X))\right]\]
(Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning. Springer, 2008)
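The loss functions above can be sketched numerically. This is a minimal example with toy values assumed purely for illustration (the data points are not from the source):

```python
import numpy as np

# Hypothetical toy values, assumed only for illustration
y = np.array([3.0, -0.5, 2.0, 7.0])       # observed numerical outputs
y_hat = np.array([2.5, 0.0, 2.0, 8.0])    # model predictions f_hat(x)

# Squared-error training loss: (1/n) * sum (y_i - f_hat(x_i))^2
mse = np.mean((y - y_hat) ** 2)
# Absolute-error training loss: (1/n) * sum |y_i - f_hat(x_i)|
mae = np.mean(np.abs(y - y_hat))

# 0-1 loss for a categorical output: fraction of misclassified cases
g = np.array([0, 1, 1, 0])                # observed class labels
g_hat = np.array([0, 1, 0, 0])            # predicted class labels
err01 = np.mean(g != g_hat)

print(mse, mae, err01)  # 0.375 0.5 0.25
```

Each quantity is the sample average of the corresponding pointwise loss, i.e. the training-error formula above applied to one particular loss function.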
There are two common approaches to estimating test error:
- Directly, using either a validation-set approach or a cross-validation approach.
- Indirectly, by adjusting the training error to account for its optimism (the bias due to overfitting), e.g. AIC, BIC, …
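The direct approach can be sketched as follows. This is a minimal numpy-only example with synthetic data assumed for illustration; it estimates test error by k-fold cross-validation for a straight-line fit and compares it with the (optimistic) training error:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data, assumed for illustration: linear signal plus noise
x = rng.uniform(-2, 2, size=100)
y = 1.5 * x + rng.normal(scale=0.5, size=100)

def kfold_cv_mse(x, y, k=5):
    """Directly estimate test MSE by k-fold cross-validation
    for a degree-1 polynomial fit."""
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, k)
    errors = []
    for fold in folds:
        mask = np.ones(len(x), dtype=bool)
        mask[fold] = False                              # hold out this fold
        coeffs = np.polyfit(x[mask], y[mask], deg=1)    # fit on the other k-1 folds
        pred = np.polyval(coeffs, x[fold])              # predict the held-out fold
        errors.append(np.mean((y[fold] - pred) ** 2))
    return np.mean(errors)

cv_error = kfold_cv_mse(x, y)

# Training error: same loss, but evaluated on the data used for fitting
train_coeffs = np.polyfit(x, y, deg=1)
train_error = np.mean((y - np.polyval(train_coeffs, x)) ** 2)
```

The training error tends to be smaller than the cross-validation estimate, since the model is evaluated on the same observations it was fitted to; the indirect approaches (AIC, BIC) instead correct for this optimism analytically.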
In a data-rich situation, the best approach for both problems (model selection and model assessment) is to randomly divide the dataset into three parts: a training set, a validation set, and a test set.
(Figure: left panel shows a single split; right panel shows multiple splits.)
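A three-way split can be sketched as below. The 50/25/25 proportions are an assumption for illustration, not a prescription from the source:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
indices = rng.permutation(n)   # shuffle once so the split is random

# Assumed 50/25/25 proportions for train / validation / test
n_train, n_val = int(0.5 * n), int(0.25 * n)
train_idx = indices[:n_train]
val_idx = indices[n_train:n_train + n_val]
test_idx = indices[n_train + n_val:]

# Training set: fit the candidate models
# Validation set: estimate test error to choose among them (model selection)
# Test set: assess the chosen model's generalization error (model assessment)
print(len(train_idx), len(val_idx), len(test_idx))  # 500 250 250
```

The test set is touched only once, at the very end; reusing it for model selection would make its error estimate optimistic, which is the same pitfall as reading off the training error.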